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We investigate the effect that the choice of measurement scale 
has upon inference and extrapolation in extreme value analysis. Sep- 
arate analyses of variables from a single process on scales which are 
linked by a nonlinear transformation may lead to discrepant conclu- 
sions concerning the tail behavior of the process. We propose the 
use of a Box-Cox power transformation incorporated as part of the 
inference procedure to account parametrically for the uncertainty sur- 
rounding the scale of extrapolation. This has the additional feature 
of increasing the rate of convergence of the distribution tails to an 
extreme value form in certain cases and thus reducing bias in the 
model estimation. Inference without reparameterization is practica- 
bly infeasible, so we explore a reparameterization which exploits the 
asymptotic theory of normalizing constants required for nondegener- 
ate limit distributions. Inference is carried out in a Bayesian setting, 
an advantage of this being the availability of posterior predictive re- 
turn levels. The methodology is illustrated on both simulated data 
and significant wave height data from the North Sea. 



1. Introduction. The usual objective of extreme value analysis is to use 
sample data from rare events of a process to make rational predictions about 
the likely levels of future extremes of the process. To do this, one models 
extreme data using an asymptotically justified probability model. The most 
fundamental such example is the generalized extreme value (GEV) distri- 
bution. The GEV arises as the limiting law for appropriately normalized 
maxima of independent and identically distributed random variables, under 
weak conditions discussed in Section 2; it is a three parameter distribution 
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with distribution function 



G{x) = exp 



{ 




where /i, a > 0, ^ are respectively location, scale and shape parameters, and 
=max{0, z}. This distribution is herein denoted GEV(/i, cr, ^). The cases 
^ > 0, = (interpreted as the limit ^ ^ 0) and ^ < are sometimes referred 
to as the Frechet, Gumbel and Negative Weibull types, respectively. Other 
approaches to modeling extreme data are discussed in Section 3. Mathemat- 
ical details of univariate extreme value theory for stationary processes are 
covered extensively in Leadbetter, Lindgren and Rootzen (1983), while more 
statistical aspects are treated in, for example. Coles (2001). 

There are many applications of extreme value analysis where data per- 
taining to the same physical process may naturally be measured on more 
than one scale. If the transformation between measurement scales is linear, 
the appropriate type of extreme value distribution remains unaltered. If, on 
the other hand, a nonlinear transformation is applied, different limiting dis- 
tributions may be appropriate. Applying extreme value methods to the data 
on these different scales can lead to disparate conclusions regarding future 
extremes. This paper proposes methodology which allows the modeler to 
take into account their uncertainty over the scale upon which to conduct 
extreme value analysis. 

As a motivating example consider the following. In ocean engineering, 
significant wave height {Hg), defined as four times the standard deviation 
of displacement from mean sea level, is a measure of ocean energy. Un- 
derstanding of the extremes of this variable is vital for offshore structural 
design. However, one might equally wish to consider the extremes of the drag 
force induced by the waves on a fixed offshore structure, a variable which 
is proportional to the square of Hg [Tromans and Vanderschuren (1995)]. 
Although the two variables are measurements of the same physical process, 
differing conclusions may be derived concerning their tail behavior. For the 
wave height data to be considered in Section 4, a simple likelihood-based 
analysis of weekly maxima of Hg produces a 100-year return level estimate 
and 95% confidence interval of 14.66 meters (13.63, 16.35). However, ana- 
lyzing Hg instead, then back-transforming the results to the Hg scale, the 
estimate becomes 16.27 meters (14.51, 18.92). Furthermore, the estimated 
shape parameters of the two variables differ markedly: for Hg, ^ = —0.12 
(-0.17, -0.06), whereas for H^, | = 0.11 (0.04, 0.19). These results suggest 
light-tailed behavior with a finite upper end point for Hg, yet heavy-tailed 
behavior with no finite upper end point for Hg. Such a situation gives rise 
to increasingly discrepant return level inferences with lengthening return 
period. It seems natural therefore to account for this uncertainty over the 
scale on which to extrapolate as part of the inference. 
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We approach this problem by incorporating a power transformation into 
the inference procedure; specifically, we use the well-known Box-Cox trans- 
formation [Box and Cox (1964)]. This transformation offers the possibility of 
improving the rate of convergence to the limiting extreme value form, since 
different distributions converge at different rates. This type of transforma- 
tion restricts the methodology to cases where the extreme data are strictly 
positive, however, this encompasses a wide variety of practical problems. 
Use of the Box-Cox transform has been previously considered by Teugels 
and Vanroelen (2004) as a way of improving the rate of convergence, and 
in Section 2 we discuss how part of our work is related to theirs. However, 
their work is purely probabilistic and, unlike ours, does not extend to con- 
sider use of the theory as a statistical technique. The use of the Box-Cox 
transformation in extreme value analysis has also been considered in an en- 
tirely different context in the work of Eastoe and Tawn (2009). In their work 
the motivation was the standardization of nonstationary data prior to the 
consideration of extreme values. 

We choose to adopt a Bayesian methodology for our inferential proce- 
dures, proceeding via Markov chain Monte Carlo (MCMC). The Bayesian 
framework allows us to produce particularly useful posterior summaries in- 
corporating uncertainty from both the data and the parameters. In partic- 
ular, it enables the calculation of a posterior predictive distribution, which 
provides a single useful summary of the likelihood of future extremes under 
the two stated sources of uncertainty [Coles and Tawn (1996)]. 

Moving from the usual three parameter extreme value models to four pa- 
rameter models including a Box-Cox parameter necessitates a reparameter- 
ization. The theory we exploit to derive our reparameterization is presented 
in Section 2, including a discussion on the rate of convergence. In Section 3 
we outline our reparameterizations and discuss associated inference meth- 
ods. In Section 4 we illustrate the methodology on simulated data and the 
aforementioned significant wave height data. A discussion of the work and 
outstanding issues is given in Section 5. 

2. Theory. 

2.1. Asymptotic and penultimate theory. Suppose are inde- 

pendent and identically distributed according to a probability law with dis- 
tribution function Fx, with density fx- In what follows it will be assumed 
that Fx{x) is twice differentiable for all sufficiently large x. Let Y denote 
these random variables after the application of a Box-Cox transformation; 
that is, Y = {X^ - 1}/A, A € M, the case A = taken as Y = \ogX, with 
distribution function Fy and density fy- Define Mx,n = maxjXi, . . . 
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The extremal types theorem [Fisher and Tippett (1928)] states that if there 
exist sequences of constants {a.Y,n > 0}, {bx,n} such that as n ^ oo 

(2.1) p/^ Mx,n-bx.n ^ A^^^^^ 

for some nondegenerate hmit distribution G{x), then G is necessarily of a 
generalized extreme value type. The symbol '— denotes weak convergence 
of the distribution functions. 

Let {ax,n}, {bx,n} henceforth specifically denote the normalizing sequences 
which lead to a GEV(0, l,£,x) limit distribution for the Mx,n- Smith (1987) 
shows that the sequences {ax,n}j {bx,n}, and the shape parameter ^x can 
be found as follows. Let hx{x) = {1 — Fx{x)}/ fx{x) denote the reciprocal 
hazard function of the parent distribution Fx ■ Then 

(2.2) bx,n = F^\l-l/n), ax,n = hx{bx,n), ix = ^^^\h'x{x) 

with = sup{x:Fx(x) < 1}, that is, the upper end point of the distri- 
bution. A finite value for ^x given by limit (2.2) and our assumptions on 
Fx are sufficient for weak convergence (2.1) and necessary and sufficient for 
both convergence of the densities and derivatives of the densities to those 
of the limiting extreme value form [Pickands (1986), Theorem 5.2] and we 
assume this applies throughout. 

The usual premise of extreme value modeling is to assume that the limit- 
ing form (2.1) holds exactly for some finite n. Fisher and Tippett (1928) and 
Smith (1987) propose an approximation of limit (2.1) by Mx,n ~ 
ni^x,niS,x,n) with ^x,n — h'-^(bx,n), referred to as the penultimate 
approximation to the shape parameter. Prom (2.2) we see = limn-5>oo lx,n ■ 
For inference purposes we therefore assume a three parameter model Mx,n ~ 
GE'V{l3x,ax,'yx), where we reserve the notation ^x for the limiting shape 
parameter. The proposal of this paper is to generalize this modeling assump- 
tion to 

MY,n = ^^ ~GEV(/5y,ay,7y), 

thereby incorporating a form of parametric scale uncertainty into the in- 
ference procedure. This gives a four parameter extreme value model, with 
canonical parameterization {/3y , ay , 7y , A}. The complex nature of the rela- 
tionships between these parameters, however, makes direct inference practi- 
cably infeasible (see Figure 2 in Section 4 for an illustration). Thus, a repa- 
rameterization to obtain more orthogonal relationships is necessary. Our 
strategy for orthogonalization relies upon obtaining {aY,n} , {bY,n} , CY,n in 
terms of the associated quantities for the original X variables. 
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Theorem 1. Under the conditions such that convergence (2.1) holds, 
with norming sequences {ax,n}^{bx,n} producing the GEV(0, limit, 
then 

V a,Y,n J 

holds for some finite when 

i^-oj OY,n = ^ , O'Y,n = aX,n[0x,n) 

Furthermore, if Fx is twice differentiahle for sufficiently large x, then the 
limiting shape parameter ^y takes the form 

(2.4) iY = ix+ lim ^^(A-1) 
with the penultimate approximation to this being given by 

(2.5) ^y,n = ex,n + ^(A-l). 

OX,n 

For any such distribution which has ^x ^ 0, then £,y = ix- 

See the Appendix for a proof. Equations (2.3) and (2.4) are used in Sec- 
tion 3 to motivate reparameterizations for the statistical models for block 
maxima and threshold exceedances. Note that when Fx is in the domain of 
attraction of a Negative WeibuU or Gumbel limit, then Fy is in the same 
domain of attraction; only those distributions which have a Frechet limit 
can be coerced into a different domain. However, as we are never practically 
in the limit, and hx{x)/x > for x > 0, values of A other than 1 will alter 
the penultimate approximation and thus change our practical estimation of 
the shape parameter for the transformed variables regardless of domain of 
attraction. 

2.2. Rate of convergence. It was noted in Section 1 that the rate of con- 
vergence to the limiting extreme value distribution may be altered by a 
power transformation. In Teugels and Vanroelen (2004) the theory of reg- 
ular variation is exploited to show what the optimal values of the power 
transformation parameter should be to maximize the rate of convergence 
in the case where ^ 0. Under our assumptions on Fx, we derive similar 
limiting results to Teugels and Vanroelen (2004) for ^x ^ 0, but also con- 
sider the case < and the penultimate approximations. In particular, the 
examples studied in Teugels and Vanroelen (2004) satisfy our assumptions. 

We use approximations developed by Smith (1987) as a basis for discussion 
on rates of convergence. Smith shows that for h'j^ ^ one may write 

(2.6) {Fx{ax,nx + bx,n)r = exp{-[l + h'j,{z)x]-'/'^'x(^^} + 0{n-') 
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for some z e [min{ax,na:; + fex,n, ^x,n}, max{ox,n2; + fex,n,fex,n}]- For /i^ = 
the first term on the RHS is e~^. It follows that the rate of pointwise 
distributional convergence is 

max{0 (I /i^,, (6x,n ) - I ) , O ( I /iS^ (ax,n^ + 6x,n ) - /ix (fcx.n ) I ) , O (n- 1 ) } . 

We focus on demonstrating how an improved rate of convergence is possible 
when this rate is equal to 0(\h!-^(hx^n) — This will in fact be the case if 

0({/ix(&x,n)r4^'^(6x,n)) < 0(|/i^(6x,n) " CD, Vr > 1. This is a condition 
satisfied by a wide range of theoretical examples, including Examples 1- 
4 below. For such distributions an improved rate of convergence will be 
achieved if 0(|/iy(6y,n) — Cy|) < 0(|/i^(6x,n) — By expressions (2.2), 
(2.4) and (2.5), 

(2 7) ~ ^'^^ 

\'Abx,n) + ^^^(A - 1) - ^mUix) + ^(A - 1)| 

bx,n x^xP y X J 

hxibx,n)/bx,n - lim^^^F hx{x)/x 



(2.8) =\h'x{hx,n)-ix 



1 + (A-1) 



h'x{bx,n) -^x 



Equation (2.8) demonstrates accelerated convergence under the transforma- 
tion if the second term on the RHS improves the order. This is the case for 
any value of A which gives convergence of this second term to 0. In partic- 
ular, this means there is a sequence of A values, denoted {A*} and given 
by 

(2.9) A*~l h'xibx,n)-^x ^ 

" hx{bx,n)/bx,n -li'oa^^^F hx{x)/x' 

which provide the best rate of convergence under any such transformation. 

For statistical applications the convergence rate of densities is also rel- 
evant. Pointwise density convergence entails an additional error term of 
0{\hxiax,nX + bx,n) /hx{bx,n) — (1 + ixx)\). The conditions on hx which 
allow us to consider 0{\h'-^(bx,n) —£,x\) for distribution function convergence 
give that 

0{\hx{ax,nx + bx,n)/hx{bx,n) - {l+ixx)\) = 0{\h'x{bx,n)-ix\)- 

For each of our variety of examples below pointwise density convergence oc- 
curs at the same rate as distribution function convergence, and any A which 
improves the rate of distribution function convergence also improves that of 
the density function. We can of course never check any of these conditions 
in practice, and it is our data rather than any theoretical knowledge which 
point to a value of A; as such, we presume that by pursuing this approach 
we at least do not lose in terms of convergence rate of the densities. 
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Below we provide illustrations for four different classes of distribution, 
largely following the examples laid out in Smith (1987). We make the cor- 
responding assumptions that the relationships in Examples 1-3 are twice- 
differentiable, in the sense that we can differentiate term-wise without af- 
fecting the 0-term representation. Table 1 summarizes the shape parameters 
for these examples, alongside the order of convergence of the penultimate 
approximations. Also detailed are values of A, denoted A*, which provide an 
improved rate of convergence. Note that these values are the limiting values 
of the sequence {A* }, where such a limit exists. 

Example 1. = +oo; a, /3,e,C > 0;D eR, 

1 - Fx{x) = C7x-"{1 + Dx-^ + 0{x-^-')} as X ^ x^ . 

This class belongs to the Frechet domain of attraction. Examples include the 
Pareto, t, F and Cauchy distributions. If Z) 0, then taking A* = /3 forces 
the leading term in |^y,n — Cy| to vanish, thus improving the convergence 
rate. 

Example 2. < +oo; a,/3,e,C > 0;!) e M, 

1 - Fx{x) = C{x^ - x)"{l + D{x^ - x)!^ + 0{{x^ - x)'^+')} 

as X ^ x^ . 

This class belongs to the Negative Weibull domain of attraction. Examples 
are distributions with bounded upper tails, such as the beta, along with 
various truncated distributions. Depending on the value of /3, the best rate 
of convergence is either given by A* = 1 (/3 > 1), or if /? < 1, the value of A 
is asymptotically inconsequential, and in this case the sequence {A* } has no 
limit. 

Examples, x-^ = +oo; a > -l;e > 0; C > 0, 

hx{x)= ^~/^[''^ =Cx-"{l + 0(x~")} asx^x^. 
Jx{x) 

This class belongs to the Gumbel domain of attraction. Examples include 
exponential (a = 0), normal (a = 1), Weibull (a = 7 — 1, for Weibull shape 
parameter 7) and gamma (a = 0). Taking A* = a + 1 improves the rate of 
convergence, again via elimination of the leading order term in \^Y,n — ^y\- 
In particular, note that for the normal distribution A* = 2 leads to faster 
convergence, the rate being improved from 0((logn)~^) to 0((logn)~^). 
More generally for sub-asymptotic levels, when (2.9) is used to obtain the 
appropriate sequence. A* 2 as n — >■ c». This example is revisited in Sec- 
tion 4.1.2. 
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Table 1 

Shape parameters and the leading order terms from the penultimate approximations for 
Examples 1-4. Three subcases for Example 4; (i) 7 < 0, (ii) 7 = 0, (iii) 7 > 



Example 




^X,n — ^X 






A* 


1 


1/q 




A/a 




P 


2 


-1/q 




-1/a 




1 


3 










~^C(A-(a + l))6-"-^ 


a + 1 


4(i) 


7 


P{n^) 


7 


XPin-') 





4(ii) 







\p 





None 


4(iii) 


N/A 


N/A 


y 


Qt 






t If and only if A = 0. 

Example 4. = +00 if 7 > 0, otherwise = e^~^l^-^ /? > 0; 7, n e M, 

-1/7 



\-Fxix) 



7 



This corresponds to the class of log-Pareto distributions [Cormann and Reiss 
(2009)]. For this class lim2._^^F h'-^{x) does not exist if 7 > 0; in this case the 
distribution is considered 'super-heavy-tailed' and falls into the domain of 
attraction of an extreme value distribution if and only if the Box-Cox pa- 
rameter A = 0. This provides the most well-known example of a distribution 
function outside any domain of attraction: 1 — Fx {x) = 1/ log(a;) ,x>e, when 
7 = /3 = li = 1. 

When lim^^^F h'-^{x) does not exist, (2.8) and (2.9) lack meaning, and 
one may revert to (2.7) to investigate whether any value of A which forces 
the existence of liuiy^yF hyiy) can be found. Direct consideration of hyiy), 
in this case writing x in place of bx,n, yields 

f3 + 7(logx — n) + 7 — (A — 1) lim {/3 + 7(logx — u)}, 
the limit of which can be seen to exist if and only if A = 0. 
3. Methodology. 

3.1. Models. The modeling setup we introduce for block maxima is 

Mx,n ~ GEV(/3x,«x,7x), My,„ = ~ GEV(/3y,ay,7r). 

This provides parameter sets Ox = {f3x,Cix,7x} and Oy = {/3y, ay,7y, A}. 
In particular, the shape parameter 7y is our finite sample approximation to 
^,Y,n, the penultimate approximation to the limiting shape parameter ^y . Es- 
timation of the parameter set 0y directly is unwieldy. This is caused by the 
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strong dependence introduced through the additional parameter A, as ex- 
hibited in the norming constant and penultimate approximation expressions 
of equations (2.3) and (2.4); see also Figure 2 in Section 4.1. Our approach 
to reducing the dependence among the parameter set is described in the 
following section. 

The above description pertains specifically to the GEV model, however, 
a common alternative to the block maxima approach in extreme value anal- 
ysis is to model all data which exceed some high threshold. The two mod- 
eling strategies employed for this purpose are (i) model exceedances via the 
generalized Pareto distribution [Davison and Smith (1990)], or (ii) model 
exceedances using a non-homogeneous Poisson process [Pickands (1971)]. 
Case (i) is essentially a reformulation of case (ii), so we discuss here only the 
latter approach. The formal asymptotic justification for the Poisson process 
model is that if we have a sequence of two-dimensional point processes 

Xj — bx,n 

ax,n 'n + l^ 

then on {xp,oo) x (0, 1), where x*p = lim.n^oo{xF — bx,n\ / cix,n with xp = 
inf{x : Fx{x) > 0}, P, a Poisson process with intensity measure 

A{(2;,oo) X (a, 6)} = {b - a){l + ^xx)^^^^"" , < a < b < l,x*p < x < oo. 

The normalizing constants {ax,n},{bx,n} and the shape parameter ^x are 
exactly as before, thus, for statistical inference on un-normalized data we 
model using a three parameter nonhomogeneous Poisson process, denoted 
PF{f3x,Cix,lx), with intensity measure 

(3.1) A{(x,oo) X (a,5)} = (a-6) 



Pn = < \ r I -.1 = 1 n 



l + ^(x-/3x) 
ax 



This parameterization is easily unified with that of the GEV. If observed 
data correspond to a particular number of blocks Nb, then to estimate the 
GEV parameters corresponding to these block maxima, {/3^,a^,7^}, one 
assumes Nb independent replications of the Poisson process with a = 0, 
6 = 1. Thus, the statistical model becomes a Poisson process with intensity 
measure 

(3.2) Nb l + -^{x-P'x) 

'^x J + 

The relation between the parameters in (3.1) with a = 0,6 = 1 and (3.2) are 
given by 

ix=lx = l, ^'x = ^x- — {'^-{-^r \ Y olx = OLx{ ^ 



7 V \^b) J \Nb 
(3.3) 

Both GEV and point process methods are considered in our examples in 
Section 4, where the key focus of our modeling is return level inference. 
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3.2. Reparameterization. When fitting a GEV(/3y, ay,7y) distribution 
to My^n, the parameters {/3y,ay} are the unknown quantities {bY,n,ciY,n}- 
This is a direct consequence of ay,n, fey,n being specifically the sequences 
which give a GEV(0, l,Cy) limit distribution for My^„. Therefore, Theorem 1 
leads to the reparameterizations 

(3.4) Py = ^^^, logay = (A-l)log/3x + logax, 

the log function being used in the latter to both ensure the positivity con- 
straint is respected and to linearize dependence. For 7y the situation is 
slightly more subtle. Equation (2.5) suggests taking 

(3.5) ^y=Jj, + ^{X-l). 

PX 

However, recalling equation (2.6), one can see that the estimable value of the 
shape parameter will not in general be /iy(&y,n)5 but rather closer to being 
^y(^y,n + £); for some unknown e. Thus, the parametric form (3.5) which is 
motivated by equation (2.5) is not strictly appropriate, and the discrepancy 
between 6y^„ and 6y^„ + e can be sufficiently large that the structure (3.5) 
is a poor choice. This presents a problem finding a satisfactory theoretical 
solution to the ratio in expression (3.5) which multiplies A — 1. 

To overcome this, we have adopted the pragmatic solution of setting 

(3.6) jY = lx + c{X-l), 

where c is a fixed value estimated prior to inference. As equation (3.6) cor- 
responds to a linear relationship between A and 7y, we used the shape of 
the profile likelihood for {7y,A} to identify the gradient of the relation- 
ship. We estimate c via calculating the profile (log-)likelihood, P£(7y, A) on 
a fine grid and performing a weighted least squares fit to the grid points 
in order to extract this slope. The weights are chosen at {7y,A} to be 
exp[— 2{P£(7y , A) — P^(7y,A)}], thus ensuring that the ridge of high like- 
lihood dominates the fit and reduces sensitivity of the resulting estimate to 
the choice of grid. Note that the calculation of P£(7y,A) over a particular 
region of interest presents no difficulties, but full inference from the likeli- 
hoods for By is infeasible. This two-step approach to the reparameterization 
has proven to work well in practice. 

3.3. Inference. The likelihood functions for a general GEV(/3,a,7) dis- 
tribution and PP(/3,a,7) above a threshold u are given for m independent 
and identically distributed data points by 

LGEv(/3,a,7) 

(3.7) 
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JJexp 



i=l 



(3.8) 



Lpp(/3,a,7) 



exp-^ -Nb 



a 



l + ^(n-/5) 
a 



-1/7 



l + ^(a;,-/3) 
a 



-1/7-1 



-1/7^ m 



i=l 



l + ^(x,-/3) 



-1/7-1 



respectively, with {xi} representing realized block maxima and threshold 
exceedances in equations (3.7) and (3.8) respectively. To extend these like- 
lihoods to the 4 parameter case simply requires that replaced by 
{u^ — 1}/A, {x^ — 1}/A, and that each term in the product is multiplied by 
the Jacobian xf~^. In what follows, reference to a '3 parameter model' re- 
lates directly to traditional extreme value models whose likelihoods are given 
by equations (3.7) and (3.8). Reference to a '4 parameter model' pertains to 
our extension. 

Equations (3.4) and (3.6) represent our reparameterizations of Oy in terms 
of a new set of parameters {/3x ,logax,7X; A}. As the first three link clearly 
to inference for Mx,n, this allows selection of good choices for parameter 
starting values by commencing initially with a 3 parameter fit. In our al- 
gorithms vague Gaussian priors (variance 10,000, centered on the estimates 
from the 3 parameter fit) and Gaussian random walk sampling are used for 
/3x ) log ax ) 7x ) and a uniform prior with independent sampling for A. The 
parameter range for A is informed by inspection of the profile likelihood 
P£(7y,A). 

The algorithm includes the constraint that if A < 0, 7y < 0, since the 
former implies a finite upper end point to the distribution, which is only the 
case when the latter also holds. Furthermore, in the case A < this upper 
end point is {(x^)^ — 1}/A < — 1/A, thus, we also impose the constraint that 
the upper end point of the fitted GEV is /3y — ay/'jY < —1/A. 

It was found that setting A^b ~ "i, the number of threshold exceedances, 
in equation (3.8) improved the mixing properties of the chain. This presents 
no major difficulties since the equations in (3.3) demonstrate how param- 
eters corresponding to different numbers of assumed blocks are linked. A 
reason for improved mixing under this adjustment is that for n total obser- 
vations and m exceedances the location parameter /? becomes the 1 — m/n 
quantile of the true underlying distribution Fx- However, our fixed and 
known threshold u is the empirical estimate of this quantile, hence, this 
choice orthogonalizes the relationship between /3 and the other parameters. 

The output of the MCMC leads to inference on return levels through 
posterior distributions on specific quantiles, and via the posterior predictive 
distribution. The 1/p block return level, x^/p, which is the 1—p quantile of 
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the distribution is found via 

(3.9) 2;i/p = [Ayi/p + l]i/\ 

where 

yi,, = PY-— [I- {-\og{l- p)r^-]. 

lY 

The \/p block posterior predictive return level, denoted Xi/p, which corre- 
sponds to the \ —p quantile of the posterior predictive distribution for Mx,n, 
is found by numerically solving 

P{Mx,n<Xyp\x) = j P(My,„<{xt/p-l}/A|6'y)p(6'y|x) d0y = 1 - p, 

where x represents the realized data, either in block maxima or threshold 
excess form. In practice, this is approximated through a discrete integral 
over the MCMC output for Oy. 



4. Examples. 



4.1. Simulated data examples. Two examples are presented. The first il- 
lustrates behavior when an exact extreme value distribution is recoverable 
through a power transformation. The second presents the case of the nor- 
mal distribution, demonstrating the practical effect of the differing rates of 
convergence for transformed and untransformed variables. For each example 
the burn-in period was 1000 iterations, with our reported analyses based on 
the subsequent 10,000 draws. 

4.1.1. P re-transformed extreme value model. Data were simulated from a 
nonhomogeneous Poisson process with parameters {/?, 0,7} = {15, 1.5, —0.25} 
and the threshold u was fixed by the parameters so that A{(n, 00) x (0, 1)} = 
100,000. The data were generated on the basis of 1000 blocks, that is, taking 
A'^B hi (3.8) to be 1000. Three sub-samples of these data were analyzed: 

1. Block maxima: 1000 maxima taken of blocks of length 100. These data 
are exactly GEV(15, 1.5, -0.25) distributed. 

2. Largest 1000 data: threshold selected to retain only the largest 1000 
points. Owing to the threshold stability property of the Poisson process, 
these still have a PP(15, 1.5, —0.25) distribution. 

3. All data exceeding the smallest block maximum: threshold selected to 
be equal to the minimum data point in data set 1. This gave 6847 data 
points. Again these are PP(15, 1.5, —0.25) distributed. 

As a testing ground for the ability of the methodology to detect a 'true' 
value of A when one exists, a square transformation was pre-applied to data 
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Fig. 1. Transformed extreme value model example: posteriors for X from (a) data set 1, 
prior range [—2,3]; (b) data set 2, prior range [—2,8]; (c) data set 3, prior range [0,2]. 



sets 1, 2 and 3, thus, they no longer fohowed the exact extreme value distribu- 
tions from which they were generated; these distributions being recoverable, 
up to location and scale shifts, by taking A = 0.5. 

Figure 1 displays the posterior distributions for A in each of the three 
scenarios. The ranges of the uniform priors for A are detailed in the caption. 
Ivlodes around A = 0.5 are detectable in (a) and (c) (data sets 1 and 3), with 
the latter being much the more concentrated density. The least information 
on A is obtained from data set 2. This is explained by the relative extremity 
of the data. The more extreme the data, the more the standard asymptotic 
convergence arguments apply. That is, with data set 2, in particular, the 
process is approximately Poisson regardless of the transformation since we 
are still considering the largest 1% of a sample which is in the domain of 
attraction of an extreme value distribution. Data set 3 contains a larger 
amount of data, with the additional data being less extreme than that of 
data set 2, thus producing the most informative posterior. 

Figure 2 displays the pairwise empirical posteriors from the MCIVIC out- 
put. The first two rows exhibit pairs from the new parameters {/3x) log(ajis:), 
7x,A}, while the bottom two rows present the implied posteriors for the 
original parameter set {/3y , ay , 7y , A}. It is clear from these figures that no 
meaningful inference could be performed without the reparameterization. 

4.1.2. Normal distribution. The data simulated were 100,000 truncated 
(at 0) N(0, 1) variables, that is, such that Fx{x) = 2^{x) - 1,2; > 0. As in 
the example of Section 4.1.1, three data sets were obtained from these: 

1. Block maxima: 1000 block maxima taken over block length 100. 

2. 1000 largest data points. 

3. All data points above the smallest block maximum. There were 8066 such 
points. 
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Fig. 2. Transformed extreme value model example: pairwise empirical posteriors for the 
new parameters {jSx , log ax , Jx , A} ( top two rows ) and the implied posteriors for the orig- 
inal parameters {Py , ctY , JY , ^} (bottom two rows). 



Figure 3 presents the posteriors for A in each case. The pattern of infor- 
mation contained on A from each data set is similar to the previous example, 
for the reasons formerly described. In Figure 3(a) there is a mode just be- 
low A = 2, in (c) the peak is around A = 1.5. These values fit well with the 
theory. The location normalizing constant for the truncated normal distribu- 
tion is bx,n ~ (21og7T,)^/^ — (1/2) X (21og?i)~^/^(log7r + loglog?i) ?s2.6 when 
n= 100. At this sub-asymptotic level, the value of A* from (2.9), using the 
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Fig. 3. Truncated normal example: posteriors for A, for (a) data set 1, prior range 
[—1,6]; (b) data set 2, prior range [—3,15]; and (c) data set 3, prior range [0,3]. 



first four leading terms in hx/x and h'^ is 1.86. For the third data set we 
are at an even lower asymptotic level. Here, replacing hx,n in the calcula- 
tion with the threshold, 1.75, gives A* = 1.48. Both of these agree with the 
evidence in the posterior for A. 

Figure 4 displays the relative return level summaries derived from the 
analysis, with reference to the true return level curve calculated by solv- 
ing {Fx{xiip)^^^^ = 1 — p. Posterior return level summaries are displayed 
pointwise, while the posterior predictive distributions are given as curves. 
In Figures 4(a) and (c) it can be observed that the 3 parameter model pro- 
duces biased estimates of the return levels, the true value falling far outside 
the posterior credible interval. In Figure 4(b) the true value is just covered 
by the interval. These results are an indication of the very slow convergence 
of the Normal distribution to the extreme value limit. From the posteriors 
for A there is certainly evidence that accelerated convergence is obtained 
from the 4 parameter model. The bias in return level estimation compared 
to the 3 parameter case is reduced, but has not disappeared. The true val- 
ues of the return level lie within each of the credibility intervals for the 4 
parameter models. This is in part down to the faster convergence, although 
the extra uncertainty involved plays a role as well. 

4.2. Wave example. The data are measured significant wave heights (Hg) 
for an unnamed location in the North Sea. There were just over 33 years 
of measurements available, with 8 measurements per day recording Hg over 
continuous 3 hour time periods. Our analysis is restricted to a single season 
to ensure approximate stationarity, taking the winter period (13 weeks be- 
ginning on 1 December each year), as this generally represents the period 
when almost all extreme events arise. We again examined the data in three 
ways: 
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Fig. 4. Truncated normal example: posterior and posterior predictive relative return level 
summaries for (a) data set 1, (b) data set 2, (c) data set 3. The solid hold line at 1 is the 
reference point for the true return level based on the truncated normal cdf; '3', '4' denote 
the relative posterior median return levels of the 3 and 4 parameter models respectively; 
dashed/solid vertical lines: 3/4 parameter model 95% credibility interval; dashed/solid con- 
nected lines: 3/4 parameter model posterior predictive return levels. 



(i) Weekly maxima, corresponding to a block size of 8 x 7 = 56 observa- 
tions. There were 433 data points in total. 
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Fig. 5. Hs data example: (a), (b), (c) profile log-likelihoods for {7y-, A}, with contours at 
levels of —1, —3, —5, —7, —12 below the maximum log-likelihood; (d), (e), (f) posteriors 
for A for analyses (i), (ii) and (iii) respectively. Prior ranges for A taken as (i) [—1,4], 
(ii) [-2,15], (iii) [0,5]. 



(ii) Cluster maxima above an 80% threshold. Runs method declustering 
[Smith and Weissman (1994)] was used, with a separation of 6 consec- 
utive sub-threshold values deemed to define a new cluster. There were 
562 data points. 

(iii) Cluster maxima above an 60% threshold, using the same declustering 
procedure as in (ii). There were 618 data points. 

In each case both the usual 3 parameter model and the appropriate proposed 
4 parameter model [GEV for (i), point process for (ii) and (iii)] were fitted. 
Our analyses are again based on 10,000 MCMC samples following a 1000 
iteration burn- in period. Figure 5 displays the profile likelihoods for {7y, A} 
and the posterior distributions of A in each scenario. As with the simulated 
data, there is more information on A for less extreme data, as evidenced 
by plots (d) and (f) compared with (e). It is interesting to note that for 
the 4 parameter GEV model, the slope c in expression (3.6) was estimated 
as 0.23, showing how the parameterization (3.6) ties in with the different 
shape parameters for Hg {jx = —0.12) and {jx = 0.11) mentioned in 
Section 1: 0.11 = -0.12 + 0.23 x (2 - 1). 
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Fig. 6. Hs data example: QQ plots for (a), (c), (e) 4 parameter model, (b), (d), (f) 3 
parameter model for analyses (i), (ii) and (iii) respectively. Dashed lines represent a 95% 
pointwise credible interval, formed from the central 95% of the posterior distribution for 
each quantile. 



Figure 6 displays QQ plots for each of the fits; here the 'fitted' quantile is 
defined to be the median of the pointwise quantile posterior distributions, 
that is, the median of Xi/p, for Xi/p given by (3.9). Each of the fits appears 
reasonable, and considering that A = 1 is plausible under each of the pos- 
teriors, this is perhaps not too surprising. However, in each case, there is 
some evidence that the very upper tail is modeled slightly better by the 4 
parameter model. 

Posterior summaries of the return levels from analysis (iii) are displayed 
in Figure 7, where increasing disparity of estimates with lengthening return 
period can be observed. The corresponding plots for analyses (i) and (ii) have 
been omitted for clarity, but show similar general trends with greater uncer- 
tainty for analysis (ii) and lesser for analysis (i). In particular, observe that 
the medians of the posterior return level distribution for the 100 and 1000 
winter return periods under the 4 parameter model lie into the upper tail of 
the same distributions under the 3 parameter model. From the motivating 
example in Section 1 it is clear why these discrepancies occur: the Hg data 
were estimated as light-tailed, with a statistically significant negative shape 
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Fig. 7. Hs data example: posterior and posterior predictive return level summaries for 
Hs , based on both 3 and 4 parameter models for analysis (iii) . Symbols and line types as 
in Figure 4- 



parameter (taking a 5% significance level); the data were estimated to be 
heavy-tailed, with a statistically significant positive shape parameter. Such 
different tail behavior will naturally lead us to different conclusions. The 
posteriors for A show that we might reasonably extrapolate on either scale, 
among other possibilites; the 4 parameter model combines all such plausible 
scenarios to build up what would appear to be a more accurate assessment 
of the uncertainty associated with these extrapolations. 

5. Discussion. The paper has presented a parametric method for incor- 
porating the uncertainty surrounding the scale of extrapolation in extreme 
value analysis. Reparameterizations which allow inference under the model 
have been derived, justified by the theory of normalizing constants for the 
limiting distribution of block maxima. Examples have demonstrated the abil- 
ity of the methodology to detect the 'true' value of A where one exists, for 
the case of finite block size/sub-asymptotic threshold. As either the block 
size tends to infinity or the threshold to the upper end point, information 
on A decreases, since there is little to be gained from a transformation. 

The fact that there may not always be significant information on A poses 
the question whether it is always necessary to incorporate this uncertainty. In 
Theorem 1 we noted that in the case where < with > 0, the shape pa- 
rameters ^Y,n Cx as the data become more extreme, since lim^^^F hx{x)/x = 
0. In such a case, where all our data are far into the upper tail, the mean 
squared error of the 4 parameter model is likely to exceed that of the 3 
parameter case. An ill-determined posterior for A may be one indication 
that utilization of this method adds an unnecessary degree of uncertainty. If 
the variance of the posterior seems unacceptably large, then the suggestion 
would be that the data do not contain information on A, in which case the 
practitioner may consider not using this method. 
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At the other end of the scale, the fact that suitable values of A may accel- 
erate convergence offers the potential for incorporating more data through 
lowering of the threshold or contracting of block length. Although we have 
not specifically explored this here, examples such as the normal data exam- 
ple given in Section 4 demonstrate how this could be worthwhile. Because 
QQ plots such as those in Figure 6 are easily obtained under both 3 and 4 
parameter models, the modeler should be able to determine if there is value 
in doing this. 

A natural question that arises is whether to consider fixing A if there is 
strong evidence for a particular value in the posterior. As outlined in Sec- 
tion 2, there are cases where a specific value will accelerate convergence, 
thus, one could assume that the modal value is a suitable one to take. How- 
ever, the reason that we have a full posterior distribution is that there is 
genuine uncertainty in this value. Arguably, therefore we mitigate against 
our errors by keeping this uncertainty. This seems unsatisfying in a world 
where we value precision in our estimates, but if uncertainty genuinely exists, 
it should not be masked by the pursuit of false precision. 

The Box-Cox class of transformations is suitable only for strictly positive 
data. In the event that interest lies in a data set for which this is not the 
case, a location shift prior to analysis would be necessary. One might also 
in such a case consider different classes of transformation. Cormann and 
Reiss (2009), for example, consider exponential transforms. From our proof 
in the Appendix it is simple to derive reparameterizations for any monotonic 
transformation, thus, one could exploit this theory in other contexts. 



Denote the transformation y{x) = {x — 1}/A and the inverse transforma- 
tion x{y) = {Ay + 1}^/'^. The distribution function Fy is given by 



Fviy) = P{Y <y) = P{X < {Xy + 1}^/^) = Fx{{Xy + 1}^/^) = Fx{x{y)). 



Denote the Jacobian of the transformation and inverse transformation by 



APPENDIX: PROOF OF THEOREM 1 



Therefore, solving -Fy(&y,n.) = 1 — 1/n for by^n yields 
Fx{{Xbyn + lV/^) = l-l/n, 

{Xby,n + 1}'/^ = - l/n) = bx, 






dx 



dy 




l/A-l 
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These are linked by Jy(y) = {Jx{x{y))}^^ . The reciprocal hazard function 
hy is 

hy{y) = ^-^y(y) = l-Fx{x{y)) ^ hx{x{y)) 
friy) fx{x{y))JY{y) Jviy) 

which gives 

fex({Aby,n + l}^/^) ^ hx{hx,n) 
{A6y,„ + l}l/A-l {hx,nY-^ 



u fu \ "A Vl^>'^l',n T J-/ ; "'AV'^A.n; \A-1 



To obtain an expression for the shape parameter, we require the derivative 
of the reciprocal hazard function for y, 



h'Y{y) = ^m^] 

dy\ JY{y) J 



My)^hx{x{y)) - hx{x{y))^JY{y) ] /JY{y? 



By the chain rule, 



±^hx{x{y)) = JY{y)h'My)) 



and 



Thus, 



JY{y) = JY{y) 



d 1 _ J'^xiy)) 



dxJx{x{y)) Jx{x{y)f 

h'Yiy) 



JY{y?h'^{x{y)) hx{x{y))J{.{y) 



My? My? 

= h'My)) + hx{x{y))^-f^y 

Jx{x(y)) 

Substituting in Jx{x) = x'^"-^, J'xix) = (A — results in 

h'yiy{x)) = h'Ax) + ^^i\-l). 

Substituting in x = bx,n gives (2.5); taking the limit as x — )■ gives (2.4). 

For the final statement, = ^^^^x^x^ ^'xi-^) — ^ implies that 
lim^^^F hx{x)/x = 0. 
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